Our proposal comprises two Blockchain systems running in parallel. The first one (Event Blockchain) is in charge of
storing the data generated by the defined indicators. For this reason, it is connected to the main components of the Big
Data ecosystem: the Big Data Application Provider (BDAP) and the Big Data Framework Provider (BDFP). This first
Blockchain system should store all the data related to the operations performed on the data, i.e., the typical Big Data
services: collection, preparation, analysis, visualization and access control. Each block of this system should share a
common structure: ID, user ID (the user who performed the operation), role of the user, timestamp, type of
operation performed, and the data affected by the operation. However, each operation also has its own characteristics
that must be taken into account. For example, a collection block can additionally store information about the data
source from which the data are ingested into the Big Data ecosystem, and the visualization service must take into account
that sensitive information can be inferred from anonymized data by performing a large number of queries on the same
dataset. Figure 1 shows
a UML diagram with a possible implementation of the Event Blockchain, displaying how it is connected to the Big Data
ecosystem.
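To make the block structure described above concrete, the following is a minimal Python sketch of an append-only, hash-linked log holding Event Blockchain blocks. The field names, the `OPERATIONS` set and the `details` dictionary for operation-specific data (such as the data source of a collection) are our own assumptions rather than part of the proposal. The sketch also omits everything that makes a real Blockchain distributed (peers, consensus, signatures); it only shows how linking each block to the hash of its predecessor makes tampering detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical identifiers for the Big Data services listed in the text.
OPERATIONS = {"collection", "preparation", "analysis", "visualization", "access_control"}
GENESIS_HASH = "0" * 64


@dataclass
class EventBlock:
    block_id: int
    user_id: str               # user who performed the operation
    user_role: str
    timestamp: str             # ISO 8601, UTC
    operation: str             # one of OPERATIONS
    affected_data: list[str]   # identifiers of the data touched by the operation
    # Operation-specific extras, e.g. {"source": ...} for a collection block (assumed field).
    details: dict[str, Any] = field(default_factory=dict)
    prev_hash: str = GENESIS_HASH
    block_hash: str = ""

    def compute_hash(self) -> str:
        """SHA-256 over every field except the block's own hash."""
        payload = asdict(self)
        del payload["block_hash"]
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


class EventChain:
    """Append-only, hash-linked list of EventBlocks (single node, no consensus)."""

    def __init__(self) -> None:
        self.blocks: list[EventBlock] = []

    def append(self, user_id: str, user_role: str, operation: str,
               affected_data: list[str], details: Optional[dict[str, Any]] = None) -> EventBlock:
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation!r}")
        block = EventBlock(
            block_id=len(self.blocks),
            user_id=user_id,
            user_role=user_role,
            timestamp=datetime.now(timezone.utc).isoformat(),
            operation=operation,
            affected_data=list(affected_data),
            details=dict(details or {}),
            prev_hash=self.blocks[-1].block_hash if self.blocks else GENESIS_HASH,
        )
        block.block_hash = block.compute_hash()
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash and check each block's link to its predecessor."""
        expected_prev = GENESIS_HASH
        for block in self.blocks:
            if block.prev_hash != expected_prev or block.block_hash != block.compute_hash():
                return False
            expected_prev = block.block_hash
        return True
```

For instance, after appending a collection event (with `{"source": ...}` in `details`) and an analysis event, changing any stored field of the first block makes `verify()` return `False`, because the recomputed hash of the modified block no longer matches the stored one.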
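The visualization concern raised above (inferring sensitive information from anonymized data through many queries on the same dataset) could be monitored, for instance, by counting queries per user and dataset in a sliding time window. The sketch below is our own crude heuristic, and `QueryVolumeMonitor`, its parameters and any threshold values are hypothetical: it only flags unusually high query volume, which an implementation might then record as an event; it does not by itself prevent inference.

```python
from collections import defaultdict, deque


class QueryVolumeMonitor:
    """Flags a (user, dataset) pair that exceeds max_queries within a sliding time window."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # Timestamps of recent queries per (user_id, dataset).
        # Assumption: timestamps for a given pair arrive in non-decreasing order.
        self._recent: dict[tuple[str, str], deque] = defaultdict(deque)

    def record(self, user_id: str, dataset: str, ts: float) -> bool:
        """Register one query at time ts (seconds); return True if the pair is now over the limit."""
        recent = self._recent[(user_id, dataset)]
        recent.append(ts)
        # Evict queries outside the window (ts - window_seconds, ts].
        while recent and recent[0] <= ts - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_queries
```

With `max_queries=3` and a 60-second window, a fourth query by the same user on the same dataset within one minute is flagged, while queries by other users or on other datasets are counted separately.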
In parallel, a second Blockchain system (Incident Blockchain) stores all the data related to the incidents already
identified. To do so, this Blockchain is connected to a component that monitors compliance with the requirements of the
Big Data ecosystem, so that when an incident occurs, it is detected and stored in the Incident Blockchain. However, it is
also necessary to store additional incident data that could help in the recovery of the system. To that end, when an
incident is identified, all event data that may be related to it are copied to this second system through a proxy
between the two Blockchain systems.
Once all the data have been collected in the Blockchain systems, they must be analyzed. This requires an intelligent
system that uses Machine Learning techniques to obtain value from these data, for example, to identify why an incident
happened, or to predict the occurrence of a new incident by analyzing events as they happen in real time. Established
processes can help to conduct this task, such as the work of Veeramachaneni et al., which proposes a series of steps to
teach a Big Data system to detect attacks: first, an unsupervised learning algorithm is run to detect outliers, which
are then analyzed by cybersecurity experts; the labels resulting from this analysis are used as training data for a
supervised learning model, and the process is repeated over several iterations to improve prediction. Moreover, all
these data should be visualized by means of a dashboard that is checked by the Incident Response Team. Figure 2 depicts
the interaction between the Big Data ecosystem and the Blockchain systems, showing a summarized version of the different
components of the SRA for Big Data ecosystems. As explained previously, this activity is not carried out sequentially,
but rather in parallel to the rest of the activities of this phase.
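The copy of incident-related event data performed by the proxy can be sketched as follows. The text does not specify how the proxy decides which events "may be related" to an incident, so the criterion used here (the event touched at least one dataset affected by the incident, inside a look-back window) is an assumption, plain Python lists stand in for the two Blockchain systems, and `BlockchainProxy`, `Event` and `Incident` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    event_id: int
    user_id: str
    operation: str
    timestamp: datetime
    affected_data: frozenset[str]


@dataclass
class Incident:
    incident_id: str
    detected_at: datetime
    affected_data: frozenset[str]
    related_events: list[Event] = field(default_factory=list)


class BlockchainProxy:
    """Copies events that may relate to an incident from the event log to the incident log.

    Plain lists stand in for the Event and Incident Blockchains (an assumption of this sketch).
    """

    def __init__(self, event_log: list[Event], incident_log: list[Incident],
                 lookback: timedelta) -> None:
        self.event_log = event_log
        self.incident_log = incident_log
        self.lookback = lookback

    def related_events(self, incident: Incident) -> list[Event]:
        """Assumed relatedness rule: inside the look-back window and sharing an affected dataset."""
        start = incident.detected_at - self.lookback
        return [
            e for e in self.event_log
            if start <= e.timestamp <= incident.detected_at
            and e.affected_data & incident.affected_data  # non-empty intersection
        ]

    def register_incident(self, incident: Incident) -> Incident:
        # Events are immutable (frozen dataclass), so sharing references is equivalent to copying here.
        incident.related_events = self.related_events(incident)
        self.incident_log.append(incident)
        return incident
```

For example, with a one-hour look-back, an incident detected at 12:10 on dataset `A` pulls in the events on `A` recorded between 11:10 and 12:10, but neither older events nor events on other datasets.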
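The iterative process attributed above to Veeramachaneni et al. can be illustrated with the following sketch. It is not their system: the z-score outlier score and the nearest-centroid classifier are simple stand-ins chosen so that the example runs on the Python standard library alone, and `analyst` is a hypothetical callback representing the cybersecurity experts who label the flagged events. Each iteration sends the top-`k` outliers (and, once a supervised model exists, the top-`k` events it considers most suspicious) to the analysts, then retrains the supervised model on all labels collected so far.

```python
import math
import statistics
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def outlier_scores(data: list[Vector]) -> list[float]:
    """Unsupervised step: Euclidean norm of each row's per-feature z-scores."""
    columns = list(zip(*data))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against constant features
    return [
        math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)))
        for row in data
    ]


def train_nearest_centroid(data: list[Vector], labels: dict[int, int]) -> Optional[Scorer]:
    """Supervised step: score > 0 means closer to the 'attack' centroid than to the 'benign' one."""
    attacks = [data[i] for i, y in labels.items() if y == 1]
    benign = [data[i] for i, y in labels.items() if y == 0]
    if not attacks or not benign:
        return None  # both classes are needed before a model can be trained
    c_attack = [statistics.fmean(c) for c in zip(*attacks)]
    c_benign = [statistics.fmean(c) for c in zip(*benign)]
    return lambda row: math.dist(row, c_benign) - math.dist(row, c_attack)


def analyst_in_the_loop(data: list[Vector], analyst: Callable[[int], int],
                        k: int = 2, iterations: int = 3) -> tuple[dict[int, int], Optional[Scorer]]:
    labels: dict[int, int] = {}  # event index -> 1 (attack) / 0 (benign)
    base = outlier_scores(data)
    model: Optional[Scorer] = None
    for _ in range(iterations):
        unlabeled = [i for i in range(len(data)) if i not in labels]
        if not unlabeled:
            break
        # Top-k by outlier score, plus top-k by the current supervised model (if any).
        picks = sorted(unlabeled, key=lambda i: base[i], reverse=True)[:k]
        if model is not None:
            picks += sorted(unlabeled, key=lambda i: model(data[i]), reverse=True)[:k]
        for i in dict.fromkeys(picks):  # de-duplicate while preserving order
            labels[i] = analyst(i)      # expert feedback
        model = train_nearest_centroid(data, labels)
    return labels, model
```

In a deployment along the lines described in the text, the feature vectors would presumably be derived from the indicators stored in the Blockchain systems and the `analyst` callback replaced by the experts' review; both are assumptions of this sketch.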